
Reimplement Atlas WPWM and Z 13 TeV TOT #2207

Open
wants to merge 8 commits into master from reimplement_ATLAS_WPWM_13TEV_TOT
Conversation

@comane (Member) commented Nov 9, 2024

Implementation agrees with legacy (t0-covmat included):

Z TOT Benchmarks

(this branch) https://vp.nnpdf.science/edlFDxSrRoK3stoHz-HtdQ==
(master) https://vp.nnpdf.science/JhZHlSMRQMigrmyn1OOcLg==

WPWM TOT Benchmarks

(this branch) https://vp.nnpdf.science/HM0ylMDJSvGocbHKjowGIQ==
(master) https://vp.nnpdf.science/MX4e1EQaQrafwPSwXihhTw==

@@ -0,0 +1,25 @@
bins:
- k1:
Review comment (Member):

You can remove the k1.

"""
kin = []

mw2 = 80.385**2
Review comment (Member):

Please, put this as a module (or even at the filter_utils level) variable, MW2 = ... so that it can be modified for many datasets at once.
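For instance, something along these lines (module location hypothetical; the value is the one already used in this filter):

# e.g. in filter_utils (hypothetical location)
MW2 = 80.385**2  # W-boson mass squared in GeV^2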

@scarlehoff (Member) commented, quoting "finally we add the lumi unc as MULT SPECIAL":

Sorry, why "special"? What's wrong with ATLASLUMI13? That will correlate it with the other 13 TeV datasets.

@enocera (Contributor) commented Nov 27, 2024, quoting @comane:

Current implementation disagrees with legacy version.

@scarlehoff and @enocera I tried to reimplement this dataset and got a different covariance matrix. I tried to have a look at what the old implementation in buildmaster was doing but failed to really understand it. What I particularly failed to understand was the way in which the covariance matrix was constructed; see lines 93-98 of ATLAS_WZ_TOT_13TEV.cc in the old buildmaster. Below I describe how I implemented the covariance matrix, following what I understood from the paper and extra sources.

In this new implementation the covariance matrix is constructed in the following way:

* get correlation matrix for systematics from https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/STDM-2015-03/tabaux_03.pdf  -> corr matrix for systematics but not for lumi

* given the systematics in Table 3 of the paper https://arxiv.org/abs/1603.09222, we build the stat and syst covmats as


1. stat_cov = np.diag([stat_wm**2, stat_wp**2])

2. syst_cov = corr_matrix * np.outer(np.array([syst_wm, syst_wp]), np.array([syst_wm, syst_wp]))


* given the stat + sys cov we decompose it and write the artificial sys as CORR MULT

* finally we add the lumi unc as MULT SPECIAL

Dear @comane, as far as I understand, with this procedure you do take into account the correlation between the W+ and W- cross sections, but you do not include the correlation with the Z cross section. My suggestion (and I think this is what was done in the old buildmaster implementation) is to extend your procedure to Z like

syst_cov = corr_matrix * np.outer(np.array([syst_wm, syst_wp, syst_z]), np.array([syst_wm, syst_wp, syst_z]))

This will construct the total systematic covariance matrix for W+, W- and Z. Now, because CC and NC DY must go in different data sets (the reason being the way in which we correlate MHOUs), you need to generate artificial systematics from the total systematic covariance matrix and correlate these across the two data sets (one for W+- and one for Z). The decomposition into artificial systematics will give you three arrays with three systematic uncertainties each. If the order is W+, W- and Z, you put the first two arrays into the W+ and W- data set, and the third array into the Z data set. These uncertainties should have the same name (you choose which one) across the board. Then you attach the stat uncertainty to each data point (no need to construct a covmat) and the lumi uncertainty.
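A minimal numpy sketch of this extended procedure; the numbers below are hypothetical placeholders for Table 3 of arXiv:1603.09222 and for the correlation matrix of the STDM-2015-03 auxiliary table:

import numpy as np

# Hypothetical placeholder uncertainties for (W-, W+, Z); take the real
# values from Table 3 of arXiv:1603.09222.
syst = np.array([0.38, 0.46, 0.04])
# Hypothetical placeholder correlation matrix; take the real one from
# the STDM-2015-03 auxiliary table.
corr_matrix = np.array([[1.00, 0.93, 0.18],
                        [0.93, 1.00, 0.19],
                        [0.18, 0.19, 1.00]])

# Total 3x3 systematic covariance matrix for W-, W+ and Z.
syst_cov = corr_matrix * np.outer(syst, syst)

# Decompose into additive artificial systematics: art_sys @ art_sys.T
# reproduces syst_cov (eigenvalues clipped against numerical negatives).
eigvals, eigvecs = np.linalg.eigh(syst_cov)
art_sys = eigvecs * np.sqrt(eigvals.clip(min=0.0))

# The first two rows go into the W+- data set, the third into the Z data
# set, with identical column names so the correlation survives across sets.
art_sys_wpwm, art_sys_z = art_sys[:2], art_sys[2:]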

@comane force-pushed the reimplement_ATLAS_WPWM_13TEV_TOT branch from 956a26f to 5ecf049 on December 8, 2024
@comane changed the title from "[WIP] Reimplement Atlas WPWM 13 TeV TOT" to "Reimplement Atlas WPWM and Z 13 TeV TOT" on Dec 8, 2024
@comane mentioned this pull request on Dec 8, 2024
@comane requested a review from @achiefa on December 8, 2024
@comane (Member, Author) commented Dec 8, 2024
@enocera thank you for the detailed explanation.

@comane requested a review from @enocera on December 8, 2024
@achiefa (Contributor) commented Dec 9, 2024

Hi @comane, is this PR ready for review?

@comane (Member, Author) commented Dec 9, 2024, replying to @achiefa:

Yes it is. As written in the PR description, the new implementation agrees with the old one apart from the t0 covariance matrices.
The reason for the disagreement is that in the old implementation the uncertainties were taken as MULT; however, looking at the paper (see Table 3), it seems to me that they are given in additive form.

@achiefa (Contributor) commented Dec 9, 2024, quoting @comane:

old implementation uncertainties were taken as MULT, however, by looking at the paper (see Table 3) it seems to me that they are given in additive form.

I see. However, I remember a similar situation where the uncertainties were nonetheless multiplicative. In other words, expressing the uncertainties in absolute value is not a sufficient condition for them to be additive. I might be wrong though, so I summon @enocera.

@comane (Member, Author) commented Dec 9, 2024

Yes, this might be the case. Perhaps @enocera remembers this detail!

@achiefa (Contributor) left a review:

Ok, I left a few comments and some questions that we can discuss.


yaml.add_representer(float, prettify_float)

MZ2 = 91.1876**2
Review comment (Contributor):
I'm starting to wonder whether we should use a common source for these parameters. I usually take the min and max values of the mass as indicated in the paper, and then take the mean value. Honestly, I don't know which is better. It's not a big issue though, as this is only used for the (x, Q2) map. What do you think?

Reply (Member, Author):

In this case I literally just took the value that was being used in the legacy version.

But this is a good point. I agree with you that it would be nice to collect these values/constants in one place (e.g. filter_utils) so as to use them consistently.

Reply (Member, Author):

Sorry, I thought I had already replied to this.
I think it's a good idea in principle.
Here I really just took the value that was already used in the legacy version for the kinematics.

@comane force-pushed the reimplement_ATLAS_WPWM_13TEV_TOT branch from bec580f to bebefea on December 9, 2024
@enocera (Contributor) commented Dec 11, 2024

@achiefa @comane

General considerations. The way in which an experimental uncertainty is presented (absolute or percentage) is not an indication of its additive or multiplicative nature: additive uncertainties can be presented as absolute or percentage values, and likewise multiplicative uncertainties can be presented as absolute or percentage values. If you are undecided whether an uncertainty is additive or multiplicative, the NNPDF convention is to set it to multiplicative, the reason being that, because of the D'Agostini bias, it is worse to treat as additive an uncertainty which is actually multiplicative than the other way round. If you have artificial systematic uncertainties, determined from the decomposition of a covariance matrix, these must instead be additive: otherwise, because of the t0 prescription, the original covariance matrix cannot be reproduced.
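A small numerical illustration of the last point, under the assumption that MULT uncertainties are rescaled by the t0-prediction/data ratio (all numbers hypothetical):

import numpy as np

# Hypothetical 2x2 covariance matrix to be decomposed.
C = np.array([[0.20, 0.05],
              [0.05, 0.30]])
w, V = np.linalg.eigh(C)
A = V * np.sqrt(w)                       # additive artificial systematics
print(np.allclose(A @ A.T, C))           # True: ADD reproduces C exactly

# If the same columns were declared MULT, under t0 each row would be
# rescaled by the t0-prediction/data ratio before rebuilding the covmat,
# and the original C would no longer be reproduced.
r = np.array([0.97, 1.05])               # hypothetical t0/data ratios
A_mult = A * r[:, None]
print(np.allclose(A_mult @ A_mult.T, C)) # False in general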

Specific considerations. As far as I understand, in the data set under discussion you have three uncertainties: the statistical uncertainty (stat), a correlated systematic uncertainty (sys_1) and the luminosity uncertainty (sys_2). The correlated systematic uncertainty must be combined with the correlation matrix to generate a covariance matrix, which is decomposed into additive artificial systematic uncertainties, as I explained above; the luminosity uncertainty should be implemented separately, treated as multiplicative and 100% correlated. I understand that this is what you did.

@comane (Member, Author) commented Dec 11, 2024

Thanks @enocera and @achiefa,
I am now treating the uncertainties as ADD (stat), MULT (lumi), ADD (sys).

The dataset now agrees fully (t0 included) with the previous implementation.
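For reference, a hedged sketch of what the uncertainty definitions could then look like in the new commondata format (field names illustrative; the lumi type follows @scarlehoff's ATLASLUMI13 suggestion):

definitions:
  stat:
    description: statistical uncertainty
    treatment: ADD
    type: UNCORR
  sys_corr_1:
    description: first artificial systematic from the W/Z covmat decomposition
    treatment: ADD
    type: CORR
  lumi:
    description: luminosity uncertainty
    treatment: MULT
    type: ATLASLUMI13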
